Corpus: tgl_wikipedia_2021_100K


4.4.1.5 Number of Word-N-grams at Sentence Endings

Number of word N-grams (N = 1 to 5) at sentence endings, counted over the first K sentences

      K  # of words  # of bigrams  # of trigrams  # of 4-grams  # of 5-grams
    100          95            99             99            99            99
   1000         370           399            408           408           408
  10000        4964          7017           8008          8237          8276
 100000       32000         63033          87328         94803         96887
1000000       32000         63033          87329         94804         96888
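
A count of this kind could be produced roughly as follows. This is a minimal sketch, not the tool that generated the table: it assumes the figures are numbers of distinct N-grams, that tokens are separated by whitespace, and that a sentence shorter than N words contributes no N-gram. The function name ending_ngram_counts and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def ending_ngram_counts(
    sentences: Iterable[str], ks: Iterable[int], max_n: int = 5
) -> Dict[int, List[int]]:
    """For each K in ks, count the distinct word N-grams (N = 1..max_n)
    that end a sentence, looking only at the first K sentences.

    Assumptions (not taken from the source): whitespace tokenization,
    and sentences with fewer than N tokens are skipped for that N.
    """
    checkpoints = sorted(ks)
    seen: List[Set[Tuple[str, ...]]] = [set() for _ in range(max_n)]
    result: Dict[int, List[int]] = {}
    processed = 0
    next_cp = 0

    for sentence in sentences:
        if next_cp >= len(checkpoints):
            break  # every requested K has been reported
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            if len(tokens) >= n:
                # The last n tokens form the sentence-final N-gram.
                seen[n - 1].add(tuple(tokens[-n:]))
        processed += 1
        while next_cp < len(checkpoints) and processed == checkpoints[next_cp]:
            result[checkpoints[next_cp]] = [len(s) for s in seen]
            next_cp += 1

    # Any K larger than the corpus gets the totals over all sentences.
    for k in checkpoints[next_cp:]:
        result[k] = [len(s) for s in seen]
    return result
```

Calling ending_ngram_counts(sentences, [100, 1000, 10000]) returns one list of five counts per K, in the column order of the table. Under these assumptions, a K larger than the number of available sentences simply repeats the totals for the whole corpus.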


Zipf's diagram for sentence endings


Gnuplot diagram
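
The data behind a Zipf diagram is a rank-frequency list: items sorted by descending frequency and numbered from rank 1. The sketch below shows one way to derive such a list for sentence endings. It is an illustration only, assuming whitespace tokenization; the function name zipf_rank_frequency and its parameter n are hypothetical and are not taken from the source.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def zipf_rank_frequency(sentences: Iterable[str], n: int = 1) -> List[Tuple[int, int]]:
    """Return (rank, frequency) pairs for sentence-final word N-grams.

    Rank 1 is the most frequent ending. Sentences with fewer than n
    whitespace-separated tokens are ignored (an assumption of this sketch).
    """
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if len(tokens) >= n:
            counts[tuple(tokens[-n:])] += 1
    # Sort frequencies in descending order and attach 1-based ranks.
    frequencies = sorted(counts.values(), reverse=True)
    return list(enumerate(frequencies, start=1))
```

Written out as two columns (rank, frequency), the pairs can be plotted on logarithmic axes, for example in Gnuplot after `set logscale xy`.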

8828 msec needed at 2021-06-26 00:05